MULTICS TECHNICAL BULLETIN MTB-152 page 1

To: Distribution

From: T. H. Van Vleck

Date: 01/13/75 

Subject: Unattended Operation of Multics



Several Multics installations have been attempting to run
their systems without an operator present. Although this is not
an advertised feature of Multics, these sites have done pretty
well by leaving the system running when the operators go off
shift, with the understanding that the system will be available
without backup or tape mounting facilities, and that if the
system crashes it will stay down until an operator comes in.
This memorandum describes changes to the supervisor and support
procedures which would make unattended operation easier and more
reliable. With some very simple changes, unattended mode becomes
usable almost immediately; later changes which make additional
improvements in unattended mode are also described.



AUTOMATIC RESTART AFTER A SYSTEM CRASH

When Multics crashes, the operator usually brings the system
back up again. Almost all of the steps which the operator takes
can be automated. These steps are:

1. Determine that the system has crashed.

2. Invoke the "CRASH" runcom. 

3. Invoke the LD355 and BOOT commands.

4. Reply "startup" to the Initializer process.

5. Start the Backup and IO daemons.

Existing facilities are sufficient for automating many of these
steps. Only minor changes need be made to fix the others.

Crashes will be detected when the system returns to BOS from
Multics operation. There are some cases of system crash such as
power failure, idle loop, or hung Initializer process which do
not return to BOS: these cases are not handled by the new
scheme, but are fortunately fairly rare. There are also cases in
which the system returns to BOS but has not crashed, for instance
after a normal shutdown, or in response to the system_control_
[illegible] command. Modifications will be made to the supervisor
and to BOS to detect these cases.

We will set up one word of storage in the BOS toehold for
intercommunication between Multics and BOS. This word will be
used as a set of 36 switches which will be set either by a BOS
command or by a Multics privileged call. One of these switches
will be reserved to mean "automatic reboot mode is on"; this
switch will be turned ON by a Multics command, and will be turned
OFF by Multics command, BOS command, or by automatic crash-loop
detecting code in Multics startup. Another switch will mean "the
system crashed": this switch will be set ON at boot time and
reset by normal shutdown and calls to pmut$bos_and_return.
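
The switch word described above can be modeled as a 36-bit flag
word. The following Python sketch is illustrative only: the bit
positions, the names AUTO_REBOOT and CRASHED, and the helper
functions are assumptions, not the actual BOS or supervisor layout.

```python
# Illustrative model of the 36-bit toehold switch word.
# Bit assignments and names are assumptions, not the real BOS layout.
WORD_MASK = (1 << 36) - 1   # the word holds 36 switches

AUTO_REBOOT = 1 << 0        # hypothetical: "automatic reboot mode is on"
CRASHED = 1 << 1            # hypothetical: "the system crashed"


def set_switch(word, switch):
    """Return the word with the given switch turned ON."""
    return (word | switch) & WORD_MASK


def reset_switch(word, switch):
    """Return the word with the given switch turned OFF."""
    return word & ~switch & WORD_MASK


def is_on(word, switch):
    """Return True if the given switch is ON in the word."""
    return bool(word & switch)


def at_boot(word):
    # "Crashed" is set at boot time; only a clean exit resets it.
    return set_switch(word, CRASHED)


def at_normal_shutdown(word):
    # A normal shutdown clears "crashed" and leaves other switches alone.
    return reset_switch(word, CRASHED)
```

Setting the "crashed" switch at boot and clearing it only on a clean
exit means that any unexpected return to BOS finds the switch still
ON, with no extra work needed at crash time.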

When the system returns to BOS, any runcom which contained
the BOOT command will continue on to its next statement. This
statement will be a test of the "crashed" switch in the standard
runcom, to discover whether the system crashed or shut down
normally.

It is already possible to distinguish between the case of a
Return-To-BOS caused by the Multics software and one caused by an
XED 4000 from the processor panel. Since the latter case implies
that someone is present in the computer room, it should not be
assumed that this action is a "crash." Similarly, the setting of
the "crashed" switch will allow BOS to distinguish between calls
to pmut$bos and pmut$bos_and_return; the second call is made
only in response to manual intervention and so should not be
detected as a crash.
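
The decision made when control returns to BOS can be summarized as
a small function. This is a sketch under stated assumptions: the
argument names and the returned action strings are hypothetical, and
the real test would be a runcom statement rather than Python.

```python
def action_on_return_to_bos(crashed, auto_reboot, manual_entry):
    """Decide what the runcom should do after BOOT returns.

    crashed      -- state of the "crashed" switch
    auto_reboot  -- state of the "automatic reboot mode" switch
    manual_entry -- True if BOS was entered from the processor panel
    All three names and both result strings are illustrative.
    """
    if manual_entry:
        # Someone is in the computer room; this is not a crash.
        return "wait for operator"
    if not crashed:
        # Normal shutdown or pmut$bos_and_return reset the switch.
        return "wait for operator"
    if not auto_reboot:
        # A real crash, but automatic mode is off.
        return "wait for operator"
    return "recover and reboot"
```

Only the combination of a software return to BOS, the "crashed"
switch ON, and automatic mode ON leads to an unattended restart.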

Some changes could be made in the scheduler to detect idle
loops, and it should be possible to set up a detector for the
cases in which the Initializer process is hung; but these are
refinements which can be proposed separately. A special command
to cause the system to crash should also be constructed which can
be used by trustworthy installation personnel in cases where they
notice that the system should be restarted. (Special access
privilege will be provided on the gate which performs this
function.)



When the operator discovers a system crash, he usually types
CRASH immediately, unless it is obvious that a hardware failure
will prevent recovery from working. The usual recovery
procedures consist of the following commands:

FDUMP (unless inhibited)

FD355



BLAST CRASH 

ESD 

SALV (if necessary) 

LD355 

BOOT 

The actual runcoms will of course be much more complicated, in
the style of MOSN-[illegible]. Various switches will be added to
allow the operator to request that the system pause before booting
and salvaging, after crashing, and so on. Similarly, modes to
suppress crash dumps altogether or to take printer and/or tape
dumps in addition to the FDUMP will be defined.
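
The recovery sequence listed above, with a few of the proposed
option switches, might be assembled as follows. This is only a
sketch: the option names are hypothetical stand-ins for the operator
switches, the pause marker is a placeholder rather than a BOS
command, and a real implementation would be a runcom.

```python
def recovery_commands(take_fdump=True, salvage=True, pause_before_boot=False):
    """Return the BOS command sequence for one recovery pass.

    The keyword arguments are hypothetical names for the operator
    switches described in the text.
    """
    commands = []
    if take_fdump:
        commands.append("FDUMP")      # crash dump, unless inhibited
    commands += ["FD355", "BLAST CRASH", "ESD"]
    if salvage:
        commands.append("SALV")       # only if necessary
    if pause_before_boot:
        # Placeholder for "wait for the operator"; not a BOS command.
        commands.append("<pause for operator>")
    commands += ["LD355", "BOOT"]
    return commands
```

For example, recovery_commands(take_fdump=False) models the mode in
which crash dumps are suppressed altogether.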

In order to prevent the system from cycling in a tight loop
of boot - crash immediately - recover - boot, several small
modifications will be made to the scheme so far outlined, so that
the default after an automatic reboot is to turn off automatic
mode, until the answering service determines that the system is
truly up, and is not in a crash loop. The advantage to this
approach is that the full Multics environment is available for
programs which try to detect the loop. Automatic rebooting will
be discontinued after N crashes after automatic mode was entered.
It will also be turned off if there are more than M crashes in K
minutes, and admin commands will be added to the Initializer to
allow the setting of these parameters.
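
The N, M, and K limits can be illustrated with a small detector.
This is a minimal sketch, not the proposed supervisor code: the class
name, the parameter names, and the use of seconds for timestamps are
all assumptions made for the example.

```python
from collections import deque


class CrashLoopDetector:
    """Decide whether automatic rebooting should stay enabled.

    max_total  -- N: crashes allowed since automatic mode was entered
    max_recent -- M: crashes allowed within any window of K minutes
    window_min -- K: window length in minutes
    """

    def __init__(self, max_total, max_recent, window_min):
        self.max_total = max_total
        self.max_recent = max_recent
        self.window_sec = window_min * 60
        self.total = 0
        self.recent = deque()   # crash times (seconds) inside the window

    def record_crash(self, now_sec):
        """Record a crash; return True if automatic reboot may continue."""
        self.total += 1
        self.recent.append(now_sec)
        # Drop crashes that have aged out of the K-minute window.
        while self.recent and now_sec - self.recent[0] > self.window_sec:
            self.recent.popleft()
        if self.total >= self.max_total:
            return False        # N crashes since automatic mode was entered
        if len(self.recent) > self.max_recent:
            return False        # more than M crashes in K minutes
        return True
```

With N = 5, M = 2, and K = 10, a third crash within ten minutes
turns automatic rebooting off even though the total is still below N.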



The restarting of the system will be done as a part of the
runcom loop which the system will be trapped in, unless a "manual
intervention" switch is set or automatic mode has been turned
off. Tape-positioning operations may have to be inserted in the
runcom to position the unified bootload tape (see MTB-13[illegible]).
The drive on which the unified tape is mounted must be unavailable
for use by the supervisor tape-assignment code, so that the tape
can remain permanently mounted.



When the Initializer has completed hardcore initialization,
it enters the ring-1 environment and waits for an input command
from the master console which tells it whether to do a cold boot,
enter BOS, or start up the system. The system_startup_ program
must be modified to check the switches in the toehold and to
manufacture a "startup" command if "automatic reboot" mode is ON.
The ring-1 environment will then proceed to call
system_control_$startup for answering service initialization.

Most installations have their system_start_up.ec arranged so
that the daemons do not automatically start running, but only
come to command level, so that operators may modify parameters or
choose not to run the daemons at all. But when no operator is
present, the daemons will then hang. A simple solution to this
problem is to have running unattended imply running without any
backup or IO daemons. Alternatively, the standard action taken
by system_start_up.ec could be to include the startup of the
daemons. Neither of these choices is attractive. The best
solution is to make an active function available which will
return "true" when the system is in "automatic reboot" mode, so
that the system_start_up.ec can take a default action when the
system is unattended, and otherwise wait for the operator.
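
The proposed active function and its use at startup can be modeled
as follows. This is a hedged sketch: the function names, the bit
position, and the choice of "start the daemons" as the default
unattended action are assumptions; the real logic would live in an
exec_com, not in Python.

```python
AUTO_REBOOT = 1 << 0   # hypothetical bit position in the toehold switch word


def automatic_reboot_mode(switch_word):
    """Model of the proposed active function: returns "true" or "false"."""
    return "true" if switch_word & AUTO_REBOOT else "false"


def startup_actions(switch_word):
    """What a system_start_up.ec might do; action strings are illustrative."""
    if automatic_reboot_mode(switch_word) == "true":
        # Unattended: take a default action instead of waiting.
        return ["start backup daemon", "start IO daemon"]
    # Attended: leave the daemons at command level for the operator.
    return ["wait for operator"]
```

Keeping the decision in the exec_com lets each installation choose
its own default action without changing the supervisor.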



FACILITIES AVAILABLE IN UNATTENDED MODE

The other duties of the system operator relate to the
tending of the printer and the mounting of tapes. If we provide
a per-system switch which tells whether the system is unattended,
it will be fairly easy to cause the system software to reject all
requests which cannot be handled automatically, and to suppress
messages which have no function except to prompt the operator.
For example, user tape-mount requests which cannot be satisfied
without human intervention should be rejected by the
tape-management software. Similarly, there seems to be little
harm in attempting to run the IO daemon, but when the printer
paper jams or runs out, the message describing this situation
need be printed only once; the device driver should then wait
quietly for someone to fix its problem.
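
The two behaviors described above, rejecting requests that need a
human and printing a problem message only once, can be sketched as
follows. The class, method, and argument names are hypothetical, and
the list named console merely stands in for the operator console.

```python
class PrinterDriver:
    """Prints a paper-problem message once, then waits quietly."""

    def __init__(self, unattended):
        self.unattended = unattended
        self.reported = False
        self.console = []   # stands in for the operator console

    def paper_problem(self, text):
        if self.unattended and self.reported:
            return          # already reported; stay quiet
        self.console.append(text)
        self.reported = True

    def problem_fixed(self):
        # Someone fixed the printer; a later problem is reported again.
        self.reported = False


def request_tape_mount(unattended, already_mounted):
    """Return True if the mount can proceed, False if it is rejected."""
    if already_mounted:
        return True         # no human intervention needed
    return not unattended   # unattended: reject rather than prompt
```

In attended mode the driver keeps repeating its message, since the
repeated prompt is then useful to the operator.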

set, user programs should be able
to determine that there is no

[illegible], if this is known.




BACKUP

The major problem with running the system in unattended mode
is that the incremental backup daemon will eventually fill its
output backup dump tape. When it does, it will request another
tape number from the operator. The reply to this question could
have been issued in advance, by a call to "reply" in
system_start_up.ec, but the daemon will then take this string and
call the tape software to have a tape mounted. Our suggestion so
far has been to refuse to mount the tape.

It will not be difficult to change the tape-assignment
package to make it work differently for the Backup daemon if the
system is unattended. When the operator is placing the system
into unattended mode, he will load all available drives with
blank backup tapes, and instruct the tape-assignment software to
assign these tapes one by one to the daemon as needed. This
requires that the tape mount software perform a slightly
different action when assigning a tape, that the tape DIM not
dump a tape at mount time if it is already mounted, and also that
the tapes which are not at load point when Multics is booted
should be (file marked and) unloaded.
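
The one-by-one assignment of preloaded tapes can be pictured as a
simple pool. This is an illustrative sketch only: the class name,
the drive and tape identifiers, and the use of None to signal an
exhausted pool are assumptions, not part of the tape-assignment
package.

```python
class PreloadedTapePool:
    """Blank backup tapes loaded on drives before the operator leaves."""

    def __init__(self, drives):
        # drives: list of (drive_name, tape_number) pairs, in assignment order
        self.free = list(drives)
        self.assigned = []

    def next_tape(self):
        """Assign the next preloaded tape to the Backup daemon.

        Returns (drive_name, tape_number), or None when the pool is
        exhausted, in which case the mount request must be refused.
        """
        if not self.free:
            return None
        tape = self.free.pop(0)
        self.assigned.append(tape)
        return tape
```

Once the pool is empty the daemon is back in the earlier situation:
a further mount request cannot be satisfied without an operator.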



CONFIGURATION 
crashes and successfully reboots is
There will be cases, however, when the

GENERALIZATIONS OF THIS SCHEME

The unattended modes should probably be implemented so that,
when a problem occurs, it will be convenient for someone to come
in, fix the problem, and let the system continue unattended. For
example, if a printer runs out of paper, and a system programmer
happens to pass the machine, he should be able to insert a fresh
box of paper and allow the daemon to continue printing, without



Initializer commands
unattended mode with respect to tapes.